Trimming CFG Parse Trees for Sentence Compression Using Machine Learning Approaches
Abstract
Sentence compression is the task of creating a short grammatical sentence by removing extraneous words or phrases from an original sentence while preserving its meaning. Existing methods learn statistics on trimming context-free grammar (CFG) rules. However, these methods sometimes lose the original meaning by incorrectly removing important parts of a sentence, because the trimming probabilities depend only on the parent and daughter non-terminals of the applied CFG rules. We apply a maximum entropy model to this approach. Our method can easily incorporate various features, such as other parts of the parse tree or the words the sentence contains. We evaluated the method using manually compressed sentences and human judgments, and found that it produced more grammatical and informative compressed sentences than other methods.
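The idea of scoring each trimming decision with a maximum entropy model can be illustrated with a minimal sketch. It is not the authors' implementation: the feature templates, the toy training data, and the names `features`, `prob_remove`, and `train` are all assumptions made for illustration. A binary maximum entropy model is equivalent to logistic regression, so the sketch trains one by stochastic gradient ascent and uses it to estimate the probability that a daughter node is removed, given its parent label, its own label, and a head word.

```python
import math
from collections import defaultdict


def features(parent, child, word):
    """Hypothetical feature templates: CFG rule context plus one lexical feature."""
    return [
        f"parent={parent}",
        f"child={child}",
        f"rule={parent}->{child}",
        f"word={word.lower()}",
    ]


def prob_remove(weights, feats):
    """Binary maximum entropy model: logistic function of the summed weights."""
    score = sum(weights[f] for f in feats)
    return 1.0 / (1.0 + math.exp(-score))


def train(examples, epochs=200, lr=0.5, l2=0.01):
    """Stochastic gradient ascent on the L2-regularized log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, label in examples:
            p = prob_remove(weights, feats)
            for f in feats:
                weights[f] += lr * ((label - p) - l2 * weights[f])
    return weights


# Invented toy data: (parent label, daughter label, head word, removed?)
TOY = [
    ("NP", "PP", "of", 0),
    ("NP", "PP", "in", 1),
    ("VP", "ADVP", "really", 1),
    ("VP", "NP", "book", 0),
    ("NP", "ADJP", "very", 1),
    ("S", "VP", "reads", 0),
]

examples = [(features(p, c, w), y) for p, c, w, y in TOY]
weights = train(examples)

# The same CFG rule (NP -> PP) gets different trimming probabilities
# once a lexical feature is available.
p_in = prob_remove(weights, features("NP", "PP", "in"))
p_of = prob_remove(weights, features("NP", "PP", "of"))
print(f"P(remove | NP->PP, 'in') = {p_in:.2f}")
print(f"P(remove | NP->PP, 'of') = {p_of:.2f}")
```

In this toy setting, a model that sees only the parent and daughter labels would have to assign both `NP -> PP` cases the same probability. The added word feature is what lets the two be separated, which is the kind of flexibility the abstract attributes to the maximum entropy formulation.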
Similar resources
Learning to Parse Bilingual Sentences Using Bilingual Corpus and Monolingual CFG
We present a new method for learning to parse a bilingual sentence using Inversion Transduction Grammar trained on a parallel corpus and a monolingual treebank. The method produces a parse tree for a bilingual sentence, showing the shared syntactic structures of individual sentence and the differences of word order within a syntactic structure. The method involves estimating lexical tr...
A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization
We consider the problem of using sentence compression techniques to facilitate query-focused multi-document summarization. We present a sentence-compression-based framework for the task, and design a series of learning-based compression models built on parse trees. An innovative beam search decoder is proposed to efficiently find highly probable compressions. Under this framework, we show how to...
Automatic Synthesis of Semantics for Context-free Grammars
We are investigating the mechanical transformation of an unambiguous context-free grammar (CFG) into a definite-clause grammar (DCG) using a finite set of examples, each of which is a pair ⟨s, m⟩, where s is a sentence belonging to the language defined by the CFG and m is a semantic representation (meaning) of s. The resulting DCG would be such that it can be executed (by the interpreter of a logic...
Improving Multi-documents Summarization by Sentence Compression based on Expanded Constituent Parse Trees
In this paper, we focus on the problem of using sentence compression techniques to improve multi-document summarization. We propose an innovative sentence compression method by considering every node in the constituent parse tree and deciding its status: remove or retain. Integer linear programming with discriminative training is used to solve the problem. Under this model, we incorporate vario...
Machine learning of syntactic parse trees for search and classification of text
We build an open-source toolkit which implements deterministic learning to support search and text classification tasks. We extend the mechanism of logical generalization towards syntactic parse trees and attempt to detect weak semantic signals from them. Generalization of syntactic parse tree as a syntactic similarity measure is defined as the set of maximum common subtrees and performed at a ...